Scaling properties of the Penna model 
We investigate the scaling properties of the Penna model, which has become a popular tool for
the study of population dynamics and evolutionary problems in recent years. We find that the
model generates a normalised age distribution for which we propose a simple scaling rule that
reproduces qualitative features for all genome sizes.

I. INTRODUCTION

In recent years, the use of computational models
has become a major trend in the discussion of problems
in population dynamics and evolutionary theory.
One of the reasons for this choice is undoubtedly the
lack of substantial amounts of observational data on the
dynamics of such systems; another is the ability of
computational models to map the dynamics of a
non-Hamiltonian system onto a set of simple rules of
interaction between the large number of its individual
constituents. Simulations of populations evolving under this
set of rules serve as a grounding test for the theoretical
ideas that inspired them. The outcome of these simu- 
lations can then provide support for the role played by 
each particular conjecture, thus helping the theorist in 
providing guidelines for her or his work. 

Statistical physicists have pioneered this effort, and 
their toolbox has proven its value in a number of dif- 
ferent problems - see Ref. [1] for recent reviews. Among 
the different models that have been used by physicists in 
the field, one stands out for its popularity. The Penna 
model [2] owes its leading role to a number of successes, 
and has furthermore managed to attract the attention
of some theoretical biologists [3]. Despite - or perhaps 
because of - its simplicity, it has shown enough power 
to unravel the key factors involved in such phenomena 
as the catastrophic senescence of semelparous species, fe- 
male menopause and species branching under ecological 
pressure. 

In the Penna model, individuals are represented by 
their genome, mapped onto one (haploid version) or two 
(diploid version) bit-strings. The standard genome used
in the Penna model is 32 bits long, for no other reason than
to make it easy to implement on processors with 32-bit words.
In a study of the mortality data of the German popu- 
lation with the Penna model, genomes were represented 
by 128-bit long strings [4]. There, it was shown that it 
is possible to compare results for two different genome 
sizes by effecting a rescaling of some parameters. This
result motivated the search for scaling in general, but
a first proposal in this direction [5] was not conclusive. 
That computer simulation used an asexual model with 
a classical Verhulst factor and tried to compare directly 
results with different rescaled parameters. Another ver- 
sion of the asexual Penna model, continuous in time and 
using a real-valued genotype, was also the object of a similar
analysis [6], but its results are not easily mapped onto 
the usual discrete version. Our approach, as can be seen 
in the following, is quite different. 



II. A PROPOSAL FOR SCALING ANALYSIS 

We are interested in studying the sensitivity of the
Penna model for diploid, sexually reproducing individuals
with respect to the number of bits used in
the implementation of the age-structured genetic load, 
and we focus on the analysis of the age distribution of 
the population. 

In the Penna model, each position (locus) of the 
genome may contain a bit set to 1 (harmful allele) or 
0. The passage of time in an individual's life triggers the 
activation of one further allele in the sequentially read 
bit-string. The number of active harmful alleles
determines the genetic death of the individual when it reaches
some pre-determined threshold value. An individual may 
also die because of intra-specific competition for resources 
of the environment, and this is usually represented by 
a density-dependent mean-field death probability, called 
the Verhulst factor. A Fortran code that simulates the 
model, and was the basis for our own simulations, can be 
found in Ref. [7]. Because the genome is age-structured, 
from a physical point of view we are studying the prop- 
erties of the model's temporal scaling. The biological 
aspect of our analysis is to provide an answer to the ques- 
tion whether the model shows dependence on the genome 
size. 
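
The genetic death rule described above can be illustrated with a minimal Python sketch. It is not the Fortran code of Ref. [7]. It assumes the usual convention of the sexual Penna model, in which a harmful allele is expressed when it is present on both strands, or on at least one strand at a dominant locus. The function names and the integer bit-string representation are hypothetical; the default threshold of 3 is the value quoted in the caption of Fig. 1.

```python
def expressed_harmful_alleles(strand_a: int, strand_b: int,
                              dominant_mask: int, age: int) -> int:
    """Count harmful alleles expressed at the first `age` loci of a diploid genome.

    Assumption: bit k of each integer is the allele at locus k (1 = harmful),
    and `dominant_mask` has a 1 at every dominant locus.
    """
    revealed = (1 << age) - 1                          # loci 0 .. age-1 already read
    homozygous = strand_a & strand_b                   # harmful allele on both strands
    dominant = (strand_a | strand_b) & dominant_mask   # one copy suffices at these loci
    return bin((homozygous | dominant) & revealed).count("1")


def dies_genetically(strand_a: int, strand_b: int, dominant_mask: int,
                     age: int, threshold: int = 3) -> bool:
    """Genetic death occurs once the expressed load reaches the threshold."""
    load = expressed_harmful_alleles(strand_a, strand_b, dominant_mask, age)
    return load >= threshold
```

Python integers have arbitrary precision, so the same sketch applies unchanged to 32, 64 or 224 loci; a compiled implementation would typically split longer genomes across several machine words.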

As a first step, we looked for the model version
best suited to the analysis and chose a variation of the
version with a Verhulst factor that operates only on the
first time step of an individual's life [8]. Since this usage
of the Verhulst factor is equivalent to making the
reproduction probability dependent on the population size, we
made this explicit by letting a female give birth with a
probability given by 1 minus the Verhulst factor. With
this choice, we are able to control the population in a 
way that is not dependent on the genome length, due to 
the fact that a living individual never feels the effect of 
an external death factor. In the version with the usual 
Verhulst strategy, each living individual has a yearly
probability to die, and the effective non-genetic death 
probability is trivially dependent on the genome size. 
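
A minimal sketch of this birth rule follows. It assumes the common definition of the Verhulst factor as the ratio between the current population and a carrying capacity (the "Verhulst parameter", 400000 in Fig. 1); the function names are hypothetical.

```python
import random


def birth_probability(population: int, capacity: int) -> float:
    """Probability that a female gives birth: 1 minus the Verhulst factor.

    Assumption: the Verhulst factor is population / capacity, capped at 1.
    """
    verhulst = min(population / capacity, 1.0)
    return 1.0 - verhulst


def gives_birth(population: int, capacity: int, rng: random.Random) -> bool:
    """Draw one birth attempt; living individuals are never removed by this rule."""
    return rng.random() < birth_probability(population, capacity)
```

Because the density dependence enters only at birth, the number of time steps an individual can survive, and hence the genome length, does not change the non-genetic death probability, which is identically zero.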

In Fig. 1 we show the age distribution of the popu- 
lation for various simulations of the model, differing by 
the number of bits in the genome. The figure shows
only the normalised values of the age distributions,
disregarding the fact that the population grows with
genome length. The first model uses a string of
32 bits. Because 32 is a natural unit for these computational
studies running on processors with 32-bit words, the
other bit strings are chosen with sizes that are multiples of this
number: 64, 96, 192 and 224 bits. In all the simulations
performed, the parameters that control the number of 
dominant loci, the age at which reproduction starts, the 
number of offspring and mutations in each generation, 
and the threshold of harmful mutations are exactly the 
same. It is also relevant to notice that the cross-over fre- 
quency during gamete production is always one in each of 
these simulations. The end of reproduction age is differ- 
ent for each genome length, in each case set to correspond 
to the maximum allowable age of the individuals (32, 64, 
96, 192 or 224). In fact, our simulation conditions per- 
mit autosustaining populations that at equilibrium are 
not very sensitive to a change of this parameter. So, our 
choice does not introduce any undesirable asymmetry. 
We can see that the distributions, although qualitatively
similar, undergo a visible differentiation. We were led by
their shape to look for a scaling law that gives a relation
between two different ages (t_1 and t_2) at which the
integrals of two distributions corresponding to different
genome sizes (p_1(t) and p_2(t)) reach the same population
value. Formally, we search for a temporal rescaling
t_2 = F(t_1) that solves the equation

\int_0^{t_1} dx \, p_1(x) = \int_0^{t_2} dx \, p_2(x)    (2.1)

The solution turns out to be a very simple linear re- 
lation. In all cases, the integral of the distribution, as a 
function of its upper limit, starts with a linear growth, 
at ages where the distribution is essentially constant, and 
ends with a saturation, at the end of the lifespan of the 
population. This behaviour suggests that a linear re- 
lation between the time values may satisfy the integral 
equality: if y_i = a(i) + b(i) t_i is a regression of the linear
part of the integral function of the distribution p_i(t),
y_i = y_j leads to the relation we are looking for:



t_j = \frac{b(i)}{b(j)} \, t_i + \frac{a(i) - a(j)}{b(j)} = c(i,j) \, t_i + d(i,j)    (2.2)

Each index, i or j, is defined as the bit string size 
divided by 32. We can determine the coefficients of this 
rescaling relation by performing the regression of each 
integral function and using the above derived formulae 
to compute c(i,j) and d(i,j). For simplicity of notation, 
we omit the first index if it is equal to 1. Table I shows 
results for these coefficients for some values of j and for 
i = 1. These simple transformation relations allow a
proper rescaling of the full integral functions, not
only of their linear parts, as well as a rescaling of the age
distributions. In fact, if we apply the inverse of the time
transformations with the coefficients in Table I and then
renormalise the rescaled functions, we obtain results that
are close to the 32-bit distribution for all the other genome sizes
(see Fig. 2).
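
The procedure just described can be sketched in a few lines of Python. The sketch fits a straight line to the linear part of each integral function and combines the two fits as in Eq. (2.2), with slope b(i)/b(j) and intercept (a(i) - a(j))/b(j). All function names are hypothetical, the integral is approximated by a running sum over integer ages, and the two flat distributions below are toy data chosen for illustration, not simulation results.

```python
from itertools import accumulate


def integral_function(p):
    """Running sum of an age distribution sampled at integer ages."""
    return list(accumulate(p))


def linear_fit(ts, ys):
    """Ordinary least-squares fit y = a + b * t; returns (a, b)."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    s_tt = sum((t - mean_t) ** 2 for t in ts)
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    b = s_ty / s_tt
    return mean_y - b * mean_t, b


def rescaling_coefficients(fit_i, fit_j):
    """Slope c(i,j) and intercept d(i,j) of t_j = c * t_i + d."""
    a_i, b_i = fit_i
    a_j, b_j = fit_j
    return b_i / b_j, (a_i - a_j) / b_j


def inverse_rescale(t_j, c, d):
    """Map an age of the larger genome back onto the reference time scale."""
    return (t_j - d) / c


# Toy data (not from the simulations): two flat distributions over ten ages.
ages = list(range(10))
fit_i = linear_fit(ages, integral_function([0.04] * 10))
fit_j = linear_fit(ages, integral_function([0.02] * 10))
c, d = rescaling_coefficients(fit_i, fit_j)
```

In practice the fit should be restricted to the ages where the integral function is still linear, before saturation sets in; the choice of that window is left to the user here.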

From the coefficients listed in Table I it is possible to 
suggest a simple approximation for the slopes of the time 
rescaling transformations: 

c(j) \approx 1 + 0.5 \, (j - 1).    (2.3)

These coefficients are physically related to the model's 
temporal scaling, as already pointed out. A similar rela- 
tion also holds for the terms d(j), which are obtained as a 
difference between the constant terms of the regressions 
of the integral functions rescaled by a slope, and thus 
depend on the values of the age distributions at zero age. 
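
As a quick check of the approximation in Eq. (2.3), the sketch below evaluates the approximate slope for a given genome size. It reads the equation as c(j) = 1 + 0.5 (j - 1), with j the genome size divided by 32; the function name is hypothetical, and the values it returns are those of the approximation, not the fitted coefficients of Table I.

```python
def approximate_slope(genome_bits: int, base_bits: int = 32) -> float:
    """Approximate slope c(j) of the time rescaling, with j = genome_bits / base_bits."""
    if genome_bits % base_bits != 0:
        raise ValueError("genome size must be a multiple of the base size")
    j = genome_bits // base_bits
    return 1.0 + 0.5 * (j - 1)
```

For the reference genome (j = 1) the slope is 1, so the transformation reduces to the identity, as it must.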

We now compare the mortality functions, derived from 
the age distributions by the equation 

f(a) = \log( p(a) / p(a+1) ),    (2.4)

where p(a) is the value of the distribution at age a. 
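
A minimal sketch of this computation is given below, reading Eq. (2.4) as the logarithm of the ratio between consecutive age classes. The function name is hypothetical, and stopping at the first empty age class is a choice made here so that the logarithm stays defined.

```python
import math


def mortality(p):
    """Mortality function log(p(a) / p(a+1)) for an age distribution p."""
    rates = []
    for a in range(len(p) - 1):
        if p[a] <= 0 or p[a + 1] <= 0:
            break  # the logarithm is undefined once an age class is empty
        rates.append(math.log(p[a] / p[a + 1]))
    return rates
```

A flat stretch of the age distribution gives a vanishing mortality, which is the plateau at small ages discussed for the larger genomes.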

In Fig. 3 these functions are plotted, after having 
rescaled the age distributions. On a linear scale, these
functions appear to collapse for young ages, and they 
diverge clearly at the large age end. The inset, on a log- 
linear scale, shows that the mortality functions have the 
same general behaviour in the small age interval shown, 
but the plot shows an increasing separation between the 
smaller and larger genomes. The collapse is not fully 
obtained, as can be seen with the help of the error bars 
shown. 

The slope of the scaling transformation is obviously 
strongly dependent on the values of the simulation pa- 
rameters. Of particular interest is a choice of these pa- 
rameters that leads to a unit slope. In this case, we may 
recover the solution of the 32-bit model from the others
just by rescaling those parameters, which amounts to
performing a renormalisation. To explore this alternate 
path, we have focused our attention on just two of the 
parameters, namely the number of dominant loci for the
harmful allele and the number of mutations added in each 
generation. The guideline here was to keep constant the 
density of mutations and dominant loci in the genomes, 
independently of their size. We only need to multiply the 
original values of these parameters in the 32-bit model
by i, the genome size divided by 32. The results of this 
renormalisation procedure are shown in Fig. 4. 
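
The renormalisation rule amounts to a simple multiplication, sketched below. The base values (1 mutation per bit string and 6 dominant loci) are those quoted for the 32-bit model in the caption of Fig. 1; the function and key names are hypothetical.

```python
def renormalised_parameters(genome_bits: int, base_bits: int = 32,
                            base_mutations: int = 1,
                            base_dominant_loci: int = 6) -> dict:
    """Scale mutations and dominant loci by i = genome_bits / base_bits.

    This keeps the density of both quantities per locus constant.
    """
    if genome_bits % base_bits != 0:
        raise ValueError("genome size must be a multiple of the base size")
    i = genome_bits // base_bits
    return {
        "mutations": base_mutations * i,
        "dominant_loci": base_dominant_loci * i,
    }
```

For example, a 96-bit genome (i = 3) would use 3 mutations per bit string and 18 dominant loci under this rule.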

III. CONCLUSION 

From the results of our numerical simulations it 
emerges that, given a Penna model with a Verhulst factor 
acting a single time in each individual's life, the scaling 
laws: 

32 \to N = 32 \cdot j

t \to [1 + 0.5 \cdot (j - 1)] \, t + [0.5 \cdot (j - 1)]

p \to [1 + 0.5 \cdot (j - \dots

where N is the number of bits in the genome and j 
an integer, lead to age distribution functions (p = p(t)) 
that have similar behaviours, although they do not agree 
quantitatively for all genome sizes. This fact allows one 
to use any genome size in a simulation, if only qualitative
features are of interest, from which the age distribution for
all other sizes can be roughly derived. It is also known
that the situation is no clearer if the threshold for
harmful mutations is scaled in proportion to the genome 
size [9]. 

As a final comment, our results seem to indicate that 
the onset of ageing, usually considered as coincident 
with the minimum reproduction age, is now, for large 
genomes, deferred. The age distributions do show a
decreasing trend, starting close to the onset of reproduction,
but they have very small derivatives, reflected in
the plateau at small ages for the mortality functions. The 
lifespan of the population increases linearly with genome 
size, as opposed to being strongly dependent only on the 
minimum reproduction age. The latter prediction is usu- 
ally considered to be a trivial consequence of the muta- 
tion accumulation theory on which the Penna model is 
based. These results are somewhat intriguing and de- 
serve further investigation. 

FIG. 1. Age distribution of the population for the 32, 64, 96,
192 and 224-bit models. The parameters used in the simulations
are: the Verhulst parameter (400000), the initial population
(1000), the minimum reproduction age (8), the number
of offspring per mating season (4), the threshold value for
harmful diseases (3), the number of mutations added at birth
per bit string (1) and the number of dominant loci (6). We
have averaged over the last 1000 steps of 10 different
realisations in each case, after all statistical distributions could
be confidently considered as stationary (simulations end after
between 50000 and 200000 steps, depending on the size of the
bit strings).

FIG. 2. By transforming the time scale of the distributions
for genome sizes that are multiples of 32 using the inverse
of the transformations with coefficients given in Table I, and
then normalising them, it is possible to approach the 32-bit
simulation from any of the others.

FIG. 3. The mortality functions computed from the
rescaled age distributions (Fig. 2). The inset shows the same
functions on a semi-logarithmic scale, for ages up to 15. Typical
error bars are shown for three of the points.



TABLE I. Coefficients c and d obtained from regressions 
of the integral function, for i = 1 and several values of j. 

FIG. 4. The different simulations when the numbers of mutations
and dominant loci are renormalised depending on the
string size to keep their density in the genomes constant. All
the simulations have a duration of 50000 Monte Carlo steps.